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Abstract. Metropolis simulations of all-atom models of peptides (i.e. small proteins) are consid- 
ered. Inspired by the funnel picture of Bryngelson and Wolyness, a transformation of the updating 
probabilities of the dihedral angles is defined, which uses probability densities from a higher tem- 
perature to improve the algorithmic performance at a lower temperature. The method is suitable for 
canonical as well as for generalized ensemble simulations. A simple approximation to the full trans- 
formation is tested at room temperature for Met-Enkephalin in vacuum. Integrated autocorrelation 
times are found to be reduced by factors close to two and a similar improvement due to generalized 
ensemble methods enters multiplicatively. 



INTRODUCTION 

Reliable simulations of biomolecules are one of nowadays grand challenges of compu- 
tational science. In particular the problem of protein folding thermodynamics starting 
purely from an aminino-acid sequence has received major attention. Until recently the 
prevailing view has been that it is elusive to search for the native states with present day 
simulational techniques due to limitations of time scale and force field accuracy. Now 
a barrier appears to be broken, as it was reported [1] that large scale distributed com- 
puting allows to achieve folding of the 23-residue design mini-protein BBA5, which is 
relatively insensitive to inaccuracies of the force field. The relaxation dynamics of the 
computer simulation is found to be in good agreement with the experimentally observed 
folding times and equilibrium constants. 

Molecular dynamics (MD) technics, for a review see 10], tend to be the method of first 
choice for simulations of biomolecules. One of the attractive features of MD is that it 
allows to follow the physical time evolution of the system under investigation. Neverthe- 
less, there has also been activity based on the Metropolis method O, which allows only 
for limited dynamical insights and has its strength in the generation of configurations 
which are in thermodynamical equilibrium. A major advantage of Metropolis simula- 
tions of biomolecules is that they allows for updates which are large moves when one 
has to follow dynamical trajectories. Such updates may help to overcome the kinetic 
trapping problem, which is due to a large number of local minima in the free energy 
space of typical biomolecules. Already Metropolis simulations of the canonical Gibbs- 
Boltzmann ensemble may jump certain barriers. Using generalized ensembles, which 
enlarge the Gibbs-Boltzmann ensemble and/or replace the canonical weights by other 



weighting factors, further progress has been made. For reviews see 

To my knowledge umbrella sampling ||6|] was the first generalized ensemble method 
of the literature. Its potential for applications to a wide range of interesting physical 
problems remained for a long time dormant. Apparently, one reason was that the com- 
putational scientists in various areas shied away from performing simulations with an 
a-priori unknown weighting factor. As Li and Scheraga put it 02D: "The difficulty of 
finding such a weighting factor has prevented wide applications of the "umbrella sam- 
pling" method to many physical systems." This changed with the introduction of the 
multicanonical approach |0] to complex systems |0 EH [HI]. Besides having the luck 
that the controversy surrounding its first application got quickly resolved by analytical 
results 11211 . the multicanonical approach addressed aggressively the problem of finding 
reliable weights. Nowadays a starter kit of Fortran programs is available on the Web lfl3ll . 

The replica exchange or parallel tempering (PT) method ifbH E5I1 uses a generalized 
ensemble which enlarges the canonical configuration space by allowing for the exchange 
of temperatures within a set of canonical ensembles. In this and other generalized en- 
sembles the Metropolis dynamics at low temperatures can be accelerated by excursions 
to disordered configurations at higher temperatures. Note that, in contrast to multicanon- 
ical simulations, the probabilities of canonically rare configurations are not enhanced by 
PT. For the purpose of simulating biomolecules PT was, e.g., studied in Ref. HE, O, Ell- 
in this article we discuss a biased updating scheme fl9ll . which enhances already 
the dynamics of canonical Metropolis simulations and is easily integrated into general- 
ized ensemble simulations too. The latter point is illustrated for parallel tempering. The 
biased updating scheme is inspired by the funnel picture of protein folding 112011 . At rel- 
atively high temperatures probability densities of the dynamical variables are estimated 
and used at lower temperatures to enhance the chance of update proposals to lie in the 
statistically relevant regions of the configuration space. In some sense this is nothing else 
but an elaboration of the original importance sampling concept of Metropolis et al.[3]. 

This article is organized as follows. All-atom protein models, the funnel picture and 
the rugged Metropolis updating are introduced in the next section. In the subsequent 
section numerical results are presented for the brain peptide Met-Enkephalin. A short 
summary and conclusions are given in the final section. 



ALL-ATOM PROTEIN MODELS 

Proteins are linear polymers of the 20 naturally occurring amino acids. Small proteins 
are called peptides. The problem of protein folding is to predict (at room temperature 
and in solvents) the 3d conformations (native structures) from the sequence of amino 
acids. A conformational energy function models the interactions between the atoms 
(units kcal/mol): 

Etoi = E es + E v dw + Ehb + Et 0K + E so \ . ( 1 ) 
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The contributions to the energy function are (charges are in units of the elementary 
charge): 
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The solvent accessible surface method allows for an approximative inclusion of solvent 
interactions: 

£ so i = Y.° iAi ' ( 6 ) 



where a,- is the solvation parameter for atom i and A, the conformation dependent solvent 
accessible surface area. 

The rij are the distances between the atoms and the OC/ are the torsion angles for 
the chemical bonds. The parameters Ay, fiy, Qj, Djj, U[ and ni are determined from 
crystal structures of amino acids and a number of thus obtained force fields are given in 
the literature. Bond length and angles fluctuate little and are normally set constant. The 
important degrees of freedom are the dihedral angles 0, i//, co and x, see FiglU 

The funnel picture of Bryngelson and Wolyness [20] gives qualitative insight into the 
process of protein folding, see Fig|2l where a schematic sketch of the free energy versus a 
suitable reaction coordinate is given. However, there is no generical definition of a good 
reaction coordinate, as the funnel lives in the high-dimensional configuration space. Here 
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FIGURE 2. Funnel picture 
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we give a parameter free funnel description lfl9ll from higher to lower temperatures, 
which suggests a method for designing the a-priori Metropolis weights for simulations 
of biomolecules. 

To be definite, we use the all-atom energy function lEHl ECEPP/2 (Empirical Confor- 
mational Energy Program for Peptides). Our dynamical variables v, are then the dihedral 
angles, each chosen to be in the range —it < v ; < %, so that the volume of the config- 
uration space is K = (2n) n . Let us define the support of a pd of the dihedral angles. 
The support of a pd is the region of configuration space where the protein wants to be. 
Mathematically, we define K p to be the smallest sub-volume of the configuration space 
for which 

p= I f[dvip{v h ...,v n ;T) (7) 
Jkp f = \ 

holds. Here < p < 1 is a probability, which ought to be chosen close to one, e.g., 
p = 0.95. The free energy landscape at temperature T is called rugged, if the support of 
the pd consists of many disconnected parts (this depends of course a bit on the adapted 
values for p and "many"). That a protein folds at room temperature, say 300^, into a 
unique native structure , . . . , v® means that its pd p(vi,... ,v„;300A^) describes small 
fluctuation around this structure. We are now ready to formulate the funnel picture in 
terms of pds 

Pr(vi,...,v„) =p(vi,...,v„;r r ), r= , (8) 

which are ordered by the temperatures T r , namely 

7i > T 2 > ... > T f . (9) 

The sequence © constitutes a protein funnel when, for a reasonable choice of the 
probability p and the temperatures ©, the following holds: 

1. The pds are rugged. 



2. The support of a pd at lower temperature is contained in the support of a pd at 
higher temperature 

K\ D K\ D ... D K p f , (10) 

e.g. for p = 0.95, T\ = 400K and T f = 300K. 

3. With decreasing temperatures T, the support Kf shrinks towards small fluctuations 
around the native structure. 

Properties 2 and 3 are fulfilled for many systems of statistical physics, when some 
groundstate stands in for the native structure. The remarkable point is that they may 
still hold for complex systems with a rugged free energy landscape, i.e., with property 1 
added. In such systems one finds typically local free energy minima, which are of negli- 
gible statistical importance at low temperatures, while populated at higher temperatures. 
In simulations at low temperatures the problem of the canonical ensemble approach is 
that the updating tends to get stuck in those local minima. This prevents convergence 
towards the native structure on realistic simulation time scales. On the other hand, the 
simulations move quite freely at higher temperatures, where the native structure is of 
negligible statistical weight. Nevertheless, the support of a protein pd may already be 
severely restricted, as we shall illustrate. The idea is to use a relatively easily calcula- 
ble pd at a higher temperature to improve the performance of the simulation at a lower 
temperature. 

The Metropolis importance sampling would be perfected, if we could propose new 
configurations {v-} with their canonical pd p (v[ , . . . , v' n ; T) . Due to the funnel property 2 
we expect that an estimate p(vi,.. .,v n ;T') from some sufficiently close-by higher 
temperature T' > T will feed useful information into the simulation at temperature T. 
The potential for computational gains is large because of the funnel property 3. The 
suggested scheme for the Metropolis updating at temperature T r is to propose new 
configurations {v^} with the pd p r _i(vj , . . . , v' n ) and to accept them with the probability 
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This equation biases the a-priori probability of each dihedral angle with an estimate of 



its pd from a higher temperature. In previous literature W22L 12311 such a biased updating 
has been used for the 4 theory, where it is efficient to propose (i) at each lattice size i 
with its single-site probability. 

For our temperatures T r the ordering © is assumed. With the definition 
p (vi, . . . ,v n ) = (2n)~ n the simulation at the highest temperature, T\, is performed 
with the usual Metropolis algorithm. We have thus a recursive scheme, called rugged 
Metropolis (RM) in the following. When p r _i {y\ , . . . , v n ) is always a useful approxima- 
tion of p r (vi, . . . ,v»), the scheme zooms in on the native structure, because the pd at Tf 
governs its fluctuations. 

To get things working, we need to construct an estimator p(vi, . . . , v n ; T r ) from the 
numerical data of the RM simulation at temperature T r . Although this is neither simple 
nor straightforward, a variety of approaches offer themselves to define and refine the 



desired estimators. In the following we work with the approximation 

p(vi,...,v n ;T r )=flp]( Vi ;T r ) (12) 
j=i 

where the p~) (v ; ; T r ) are estimators of reduced one-variable pds defined by 

p}( Vi -T) = [ + *HdVjp(y u ...,v n ;T). (13) 

J-* j# 

The implementation of the resulting algorithm, called RMi, is straightforward, as es- 
timators of the one-variable reduced pds are easily obtained from the time series of a 
simulation. The CPU time consumption of RMi is practically identical with the one of 
the conventional Metropolis algorithm. 



NUMERICAL RESULTS 

To illustrate the developed ideas, we rely on the brain peptide Met-Enkephalin, which 
is a numerically well-studied 11241 125L 1 1 Oi I26L 12711 . Met-Enkephalin is determined by the 
amino-acid sequence Tyr-Gly-Gly-Phe-Met or, in the short notation, Y-G-G-F-M, 
where 

Tyr (Y) — Tyrosine 
Gly(G) - Glycine 
Phe (F) — Phenylalanine 
Met (M) - Methionine 

Our simulations are performed with a variant of SMMP [28] (Simple Molecular Dy- 
namics for Protein) using fully variable CO torsion angles. For the data analysis we keep 
a times series by writing out configurations every 32 sweeps. 

Fig|3] show the probability densities of the dihedral angle V13 (for the notation see 
Tabled]) at 400 K and 300 K. This angle is chosen because it illustrates the possibility of 
large moves in the Metropolis updating, which jump barriers of a molecular dynamics 
simulation. Namely, a single update may take us directly from each of the three popu- 
lated regions to each other, whereas one encounters barriers when one has to move in 
small increments AV13. 

To evaluate the relative performance of different algorithms, we measure the inte- 
grated autocorrelation times T mt for the energy and each dihedral angle. The integrated 
autocorrelation times are directly proportional to the computer run times needed to 
achieve the same statistical accuracy. For an observable / the autocorrelations are 



C(t) = (foft)-(f} 2 



(14) 




where t labels the computer time. Defining c(t) = C(t)/C(0), the time-dependent inte- 
grated autocorrelation time is given by 

T int (0 = l+2£c(O. (15) 

f'=l 

Formally the integrated autocorrelation time T^t is defined by Ti nt = hm t ^ ao / ^ at (t). 
Numerically this limit cannot be reached as the noise of the estimator increases faster 
than the signal. Nevertheless, one can calculate reliable estimates by reaching a window 
of t values for which v mt (t) becomes flat, while its error bars are still reasonably small. 

Columns 5 and 6 of Table [T] collect our estimates of the integrated autocorrelation 
times from conventional, canonical Metropolis simulations at 400 K and 300 K. We 
find a remarkable slowing down. For the energy, which is characteristic for the over-all 
behavior, Ti nt increases by a factor of about ten. For certain angles, e.g. vio, Tint increases 
even by factors larger than twenty. At 400 K conventional Metropolis simulations allow 
to calculate observables with relative ease, while this is no longer the case at 300 K. Our 
aim is to use information from the 400 K simulation to improve on the performance at 
300 K. 

One finds that the energy histograms of the 400 K and the 300 K simulation have a 
considerable overlap, see Fig Bl Therefore, the two temperatures are well-suited to be 
combined into a PT iTfl Il5h simulation. We abstain here from introducing additional 
temperatures, because our aim is to get a clear understanding of the improvement of the 
300 K simulation due to PT input from 400 K. 

In Fig 13 integrated autocorrelations times for the energy variable are compared at 
300 K. The order of the curves agrees with the order in the figure legend. From 
up to down: Conventional canonical Metropolis simulation, RMi improved canonical 
Metropolis simulation with input from 400 K, PT simulation coupled to 400 K and the 
RMi improved PT simulation. The variable t of equation <IT3T> is given in units of 32 



TABLE 1. Integrated autocorrelation times. 
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3.38 (22) 
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1.07 (02) 
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1.12(04) 


20. 


x 3 


Met-5 


Met-5 


1.01 (01) 


1.05 (02) 


1.04(02) 


1.04(02) 


1.01 (01) 


21. 


x 4 


Met-5 


Met-5 


1.00 (01) 


1.01 (01) 


1.00 (01) 


1.01 (01) 


1.01 (01) 


| 22. 





Met-5 


Met-5 


2.77 (10) 


31.4 (2.9) 


20.8(1.9) 


14.9(1.1) 


9.16(76) | 


| 23. 





Met-5 


Met-5 


1.54 (05) 


21.4 (2.3) 


13.9(1.7) 


7.83 (48) 


3.73(19) | 


| 24. 


CO 


Met-5 


Met-5 


1.06 (01) 


1.14(02) 


1.03 (02) 


1.08 (01) 


1.03(02) | 




E 






4.98 (20) 


49.6 (5.0) 


26.2(1.6) 


19.9(1.6) 


9.94(60) | 



sweeps due to the way our data are recorded. In the range shown, i.e. up to t = 200 x 32 
sweeps, a window exist for the PT and the RMi improved PT simulations, which allows 
to estimate the integrated autocorrelation times of these simulations. For the other two 
simulations one has to go to even larger t values, but it remain possible to estimate Tj nt . 
For the energy and all 24 dihedral angles the Ti nt estimates are collected in Table [T] 

For the energy we find a decrease from Ti nt ~ 50 to Ti nt ~ 25 due to the RMi 
improvement of the canonical Metropolis simulation, i.e. approximately a factor of 2. 
The PT improvement of this simulation is even larger, namely from Ti nt ~ 50 to Ti nt ~ 
20, i.e. approximately a factor of 2.5. In both cases the CPU time spent at 400K 
is not part of the equation. This is justified as one should anyway understand the 





system first at temperatures where one does not suffer from long autocorrelation times. 
For PT the factor 2.5 is the improvement in real time when two identical PC nodes 
with some parallel software like MPI (Message Passing Interface) are available. Most 
remarkably, the two improvements multiply: For the RMi improved PT simulation the 
autocorrelation time is down to Ti nt « 10, only about one more factor of two away from 
the value at 400 K. 

Inspection of the Tj nt values for the 24 dihedral angles shows large differences. For 
the conventional canonical simulation the range is from 1 Tj nt xs 1 for V21 to Tj nt 200 for 



A value T; nt = 1 means that our resolution of measurements every 32 sweeps is too crude to show the 
autocorrelations, which one may still expect on the scale of a few sweeps. 



vi6. For the PT-RMi simulation it becomes reduced to a range from Ti nt ~ 1 to Ti nt ~ 20. 
This suggests that it is not efficient to simulate by sweeps, where each dihedral angle is 
updated once per sweep. Instead, an algorithm (systematic or random) where the number 
of updates per angle is proportional to its integrated autocorrelation time is expected 
to be more efficient. Obviously, this can be implemented without major changes of 
the existing code, but tests of this idea have not yet been completed. After all these 
improvements the 300 K simulation is expected to be no more autocorrelated than the 
conventional, canonical simulation at 400 K. 

All-atom Metropolis simulations of small biomolecules deserve a place on their own 
in the arsenal of computational biophysics. A good understanding of smaller pieces 
of a larger protein is one of the ingredients, which can pave the way towards the 
understanding of large systems like proteins. Major algorithmic problems remain to be 
solved before Metropolis simulations of all-atom models may become directly suitable 
for applications like the folding of small to medium sized proteins. 

One algorithmic problem is that of correlated moves of two or more dihedral angles. 
Such moves promise to overcome (jump) essential free energy barriers in the case of 
larger molecules. Within conventional canonical Metropolis simulations the acceptance 
rates for simultaneous moves of two and more angles are prohibitively small. The RM 
concept promises major inroads, but details have not yet been tested out. 

An even more challenging problem is the inclusion of solvent effects. Here the 
large moves of Metropolis updates create the cavity problem. We need a cavity in the 
surrounding water to accommodate the move and updates of the dihedral angles do not 
create one. This appears to be one of the major reasons why large scale simulations 
of proteins with a surrounding solvent are to an overwhelming extent done within the 
MD framework, where the cavity problem is avoided due to joint, small moves of all 
degrees of freedoms. Within the Metropolis approach effective solvent models may 
provide a solution. They are, e.g., defined by a solvent contribution like E so ] in Eq.©. 
Unfortunately, no reliable determination of the parameters entering this equation exists 
presently in the literature, as was demonstrated by recent simulations [ 29l 3^\. However, 
it appears to be within the reach of Metropolis simulations to lead the way towards 
determining reliable parameterizations. 



SUMMARY AND CONCLUSIONS 

• The RMi approximation to the rugged Metropolis (RM) method lfl9ll leads already 
to considerable improvements over conventional Metropolis simulations of Met- 
Enkephalin at 300 K. As RMi is easily implemented and needs no additional 
computer time, it should be used whenever Metropolis simulations of suitable 
systems are done. 

• For larger systems one-variable moves alone will not work due to correlations be- 
tween the dihedral angles. The RM approach promises sufficiently large acceptance 
rates for multi- variable moves. 

• Ultimately, each biomolecule of interest may need its own, specifically designed, 
Metropolis algorithm. This task includes to determine reliable parameters for a 



solvent model like the one of Eq.@. 
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